-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-26570][statefun] Remote module configuration interpolation #309
base: master
Are you sure you want to change the base?
[FLINK-26570][statefun] Remote module configuration interpolation #309
Conversation
3d3ff6f
to
4f63228
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this PR @FilKarnicki. I like the flexibility it gives to the user. The main thing I am asking myself is whether it gives users too much power so that they risk shooting themselves into the foot. I don't have a perfect answer for it yet.
1. System properties | ||
2. Environment variables | ||
3. flink-conf.yaml entries with prefix 'statefun.module.global-config.' | ||
4. Command line args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more common to give env variables precedence over flink-conf.yaml
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in principal. That said, because globalConfiguration
is already an combination of args and flink-conf.yaml entries with the statefun.module.global-config.
prefix, there's no easy way to put env variables in between them without affecting other parts of the system.
I made a start in my fork and the number of changes is pretty high for what we gain. Please have a look and let me know if I Should working on the change you mentioned in this comment
kind: io.statefun.endpoints.v2/http | ||
spec: | ||
functions: com.example/* | ||
urlPathTemplate: ${FUNC_PROTOCOL}://${FUNC_DNS}/{function.name} | ||
--- | ||
kind: io.statefun.kafka.v1/ingress | ||
spec: | ||
id: com.example/my-ingress | ||
address: ${KAFKA_ADDRESS}:${KAFKA_PORT} | ||
consumerGroupId: my-consumer-group | ||
topics: | ||
- topic: ${KAFKA_INGRESS_TOPIC} | ||
(...) | ||
properties: | ||
- ssl.truststore.location: ${SSL_TRUSTSTORE_LOCATION} | ||
- ssl.truststore.password: ${SSL_TRUSTSTORE_PASSWORD} | ||
(...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the flexibility the templating mechanism gives to the user. The thing I am asking myself is whether it gives too much power so that the user can shoot himself into his foot.
The danger I see is that we add level of indirections that make it harder to reason about what the effective yaml will look like for a user. The underlying problem the issue wants to solve is to pass in information that is only available to the process that runs SF but not the user (e.g. secrets). I am wondering whether there is an alternative to achieve the same but with a bit less power (e.g. allowing substitution only for selected fields (also confusing)). I don't have a perfect answer here. I mainly wanted to hear your and @igalshilman's opinion here.
Maybe one idea could be to log the effective/resolved yaml somewhere so that the user sees what is actually run, if this does not happen already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of logging the effective yaml somewhere. We should probably make it an opt-in kind of a deal, since we don't want to be automatically logging secrets. I'll hold off on coding this until we hear from @igalshilman
I'm beginning to think that we can achieve the same with a 'jar -uf' at deploy time. This means that this PR can be closed, do you agree @igalshilman / @tillrohrmann ? |
I still see value in this PR. When deploying into a managed environment such as Kinesis Data Analytics on AWS it is not possible to use something like 'jar -uf' to modify the jar at deploy time. We see value in being able to pass a hostname for API Gateway as an environment variable, for instance, to simplify multi environment deployment in an AWS CDK stack. |
What is the purpose of the change
The goal of this PR is to add the ability to define ${placeholders} in the module.yaml's
spec
object, e.g.:These placeholders are then replaced using
where (4) override (3) which override (2) which override (1).
Main changes are
RemoteModule
resolves placeholders by traversing through thespec
json nodeComponentJsonObject
has a new constructor that allows the use of resolvedspec
json node (note:rawObjectNode
still contains unresolved ${placeholders}, but does not seem to be used anywhere)Verifying this change
Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @public(Evolving): N/A
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
The S3 file system connector: no